MFQ scale from 11 MFQ items (child self-report) at 12 | lcmfqt1 | 0-22 | 11432 | 12 Year
Y16: Depression (MFQ) | MFQ total scale (child behaviour questionnaire at 16) | pcbhmfqt1 | 0-26 | 9906 | 16 Year
Y21: Depression (MFQ) | MFQ overall total score (TEDS21 phase 1 twin questionnaire) | u1cmfqt1 | 0-16 | 9204 | 21 Year
Y26: Depression (MFQ) | MFQ overall total score (TEDS26 twin Mental Health Questionnaire) | zmhmfqt1 | 0-26 | 8306 | 26 Year
Y21: Anxiety (GAD-D) | General Anxiety overall total score (TEDS21 phase 2 twin questionnaire) | u2cganxt1 | 0-40 | 8236 | 21 Year
Y26: Anxiety (GAD-D) | GAD-D (General Anxiety) overall total score (TEDS26 twin Mental Health Questionnaire) | zmhganxt1 | 0-40 | 8022 | 26 Year
Y12: Externalising | SDQ Externalising scale at 12 | lsdqext1 | 0-20 | 11389 | 12 Year
Y16: Externalising | SDQ Externalising scale at 16 | psdqext1 | 0-20 | 9889 | 16 Year
Y21: Externalising | SDQ Externalising scale at 21 | usdqext1 | 0-19 | 9210 | 21 Year
Y26: Externalising | SDQ Externalising scale at 26 | zsdqext1 | 0-19 | 7718 | 26 Year
Note. MFQ = Mood and Feelings Questionnaire; SDQ = Strengths and Difficulties Questionnaire; GCSE = General Certificate of Secondary Education; KS3 = Key Stage 3; GAD-D = Generalized Anxiety Disorder scale; PARCA = Parent Report of Children's Abilities; MZ = Monozygotic; DZ = Dizygotic; HNC = Higher National Certificate; HND = Higher National Diploma; CSE = Certificate of Secondary Education.
Participation-Related plots
Missing Data at each timepoint
Family-Level Participation Metrics
Code
# Overall participation rate at each time point:
# the share of non-missing, non-negative values that equal 1
df %>%
  select(acontact, all_of(rq1y)) %>%
  sapply(function(x) length(which(x == 1)) / length(which(x >= 0))) %>%
  as.data.frame() %>%
  rownames_to_column() %>%
  `colnames<-`(c("Time Point", "Participation Rate")) %>%
  mutate(`Time Point` = factor(`Time Point`, levels = `Time Point`, labels = var_to_label(`Time Point`))) %>%
  ggplot(aes(x = `Participation Rate`, y = `Time Point`)) +
  geom_col() +
  geom_text(
    aes(label = paste0(round(`Participation Rate` * 100, 1), "%")),
    hjust = 1.1, # Right align inside the bar (>1 moves it inside)
    color = "white", size = 3.5, fontface = "bold"
  ) +
  # geom_vline(aes(xintercept = .2)) +
  theme_bw()
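The participation rate in the code above is the number of values equal to 1 divided by the number of non-missing, non-negative values. A minimal sketch of that calculation on a toy vector follows; the vector `x` and its 0/1/NA coding are assumptions for illustration, not the study's codebook.

```r
# Toy participation indicator (hypothetical): 1 = took part, 0 = did not,
# NA = no valid code for this time point.
x <- c(1, 0, 1, NA, 1, 0, NA, 1)

# Same expression as in the pipeline: which() silently drops NA comparisons,
# so the denominator counts only values that are >= 0 and not missing.
rate_which <- length(which(x == 1)) / length(which(x >= 0))

# Equivalent form, valid only while every non-missing code is 0 or 1.
rate_mean <- mean(x == 1, na.rm = TRUE)

rate_which
rate_mean
```

If a variable also carried negative codes, the two forms would diverge, because `x >= 0` excludes negative values from the denominator while `mean(..., na.rm = TRUE)` does not.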
Code
save_plot("11_participation_rate", width = 9, height = 5)

# Participation rate at each time point, split by cohort
df %>%
  select(cohort, acontact, all_of(rq1y)) %>%
  group_by(cohort) %>%
  summarise(across(everything(), ~ length(which(. == 1)) / length(which(. >= 0)), .names = "{.col}"), .groups = "keep") %>%
  pivot_longer(cols = -cohort, names_to = "Time Point", values_to = "Participation Rate") %>%
  mutate(
    `Time Point` = factor(`Time Point`, levels = c("acontact", rq1y), labels = var_to_label(c("acontact", rq1y))),
    Time_numeric = as.numeric(`Time Point`),
    cohort = gsub(" twins", "", cohort),
    cohort = gsub(" to ", "-", cohort)
  ) %>%
  ggplot(aes(x = Time_numeric, y = `Participation Rate`, color = cohort, group = cohort)) +
  geom_point(size = 3) +
  geom_line(linewidth = 1) +
  geom_text(
    aes(label = paste0(round(`Participation Rate` * 100, 1), "%")),
    vjust = -0.5, hjust = 0.5, size = 3, show.legend = FALSE
  ) +
  scale_x_continuous(breaks = 1:length(c("acontact", rq1y)), labels = var_to_label(c("acontact", rq1y))) +
  scale_y_continuous(labels = scales::percent_format()) +
  theme_bw() +
  labs(x = "Time Point", y = "Participation Rate", color = "Cohort", title = "Participation Rates Over Time by Cohort") +
  # ggplot2 >= 3.5.0: numeric legend positions go in legend.position.inside
  theme(legend.position = "inside", legend.position.inside = c(.8, .8), axis.text.x = element_text(angle = 45, hjust = 1)) +
  theme(plot.margin = margin(t = 1, r = 1, b = 1, l = 10, unit = "mm"))
Code
save_plot("11_participation_rate_by_cohort", width = 11, height = 9)

# Same plot, restricted to participants who took part at Year 2
df %>%
  # Filter before select() so that btwoyear is available regardless of the selected columns
  filter(btwoyear == 1) %>%
  select(cohort, acontact, all_of(rq1y)) %>%
  group_by(cohort) %>%
  summarise(across(everything(), ~ length(which(. == 1)) / length(which(. >= 0)), .names = "{.col}"), .groups = "keep") %>%
  pivot_longer(cols = -cohort, names_to = "Time Point", values_to = "Participation Rate") %>%
  mutate(
    `Time Point` = factor(`Time Point`, levels = c("acontact", rq1y), labels = var_to_label(c("acontact", rq1y))),
    Time_numeric = as.numeric(`Time Point`),
    cohort = gsub(" twins", "", cohort),
    cohort = gsub(" to ", "-", cohort)
  ) %>%
  ggplot(aes(x = Time_numeric, y = `Participation Rate`, color = cohort, group = cohort)) +
  geom_point(size = 3) +
  geom_line(linewidth = 1) +
  geom_text(
    aes(label = paste0(round(`Participation Rate` * 100, 1), "%")),
    vjust = -0.5, hjust = 0.5, size = 3, show.legend = FALSE
  ) +
  scale_x_continuous(breaks = 1:length(c("acontact", rq1y)), labels = var_to_label(c("acontact", rq1y))) +
  scale_y_continuous(labels = scales::percent_format()) +
  theme_bw() +
  labs(
    x = "Time Point", y = "Participation Rate", color = "Cohort",
    title = "Participation Rates Over Time by Cohort",
    subtitle = "Restricted to participants who took part at Year 2"
  ) +
  # ggplot2 >= 3.5.0: numeric legend positions go in legend.position.inside
  theme(legend.position = "inside", legend.position.inside = c(.8, .8), axis.text.x = element_text(angle = 45, hjust = 1)) +
  theme(plot.margin = margin(t = 1, r = 1, b = 1, l = 10, unit = "mm"))
| Time Point | Zygosity Coef. | Zygosity p | Sex Coef. | Sex p |
|---|---|---|---|---|
| Y4 (parent-report twin booklet) | -0.064 | 0.093 | -0.098 | 0.001 |
| Y12 (web tests) | -0.240 | 0.000 | -0.241 | 0.000 |
| Y12 (questionnaire) | -0.225 | 0.000 | -0.176 | 0.000 |
| Y16 (behaviour booklet) | -0.181 | 0.000 | -0.338 | 0.000 |
| Y18 (questionnaire) | -0.169 | 0.000 | -0.224 | 0.000 |
| Y21 (TEDS21 phase-1 questionnaire) | -0.201 | 0.000 | -0.752 | 0.000 |
| Y26 (TEDS26 questionnaire) | -0.193 | 0.000 | -0.875 | 0.000 |
| Y26 (CATSLife web tests) | -0.258 | 0.000 | -0.790 | 0.000 |
How many different time points do we have for each participant (excluding first contact)?
Code
# Count, for each twin, the number of time points with participation
df %>%
  select(all_of(rq1y)) %>%
  apply(1, sum) %>%
  tibble(`Number Time Points Participated` = .) %>%
  count(`Number Time Points Participated`) %>%
  ggplot(aes(x = `Number Time Points Participated`, y = n)) +
  geom_col() +
  geom_text(aes(label = n), vjust = 1.2, color = "white", size = 3.5) +
  theme_bw() +
  labs(
    title = "How many times has each participant been tested (excl. first contact)?",
    subtitle = "Twin-level participation numbers (N = 27890)"
  )
Call:
glm(formula = formula, family = binomial, data = df)
Coefficients:
                                             Estimate Std. Error z value Pr(>|z|)
(Intercept)                                  -0.63412    0.02676 -23.696  < 2e-16 ***
cohortCohort 2: twins born Sep-94 to Aug-95   0.93646    0.04510  20.766  < 2e-16 ***
cohortCohort 3: twins born Sep-95 to Aug-96   0.87605    0.06398  13.692  < 2e-16 ***
cohortCohort 4: twins born Sep-96 to Dec-96   0.72585    0.07991   9.083  < 2e-16 ***
aonsby1995                                    0.18654    0.04495   4.150 3.32e-05 ***
aonsby1996                                    0.14914    0.06514   2.290    0.022 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 35954 on 26039 degrees of freedom
Residual deviance: 34729 on 26034 degrees of freedom
AIC: 34741
Number of Fisher Scoring iterations: 4
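The coefficients in the output above are on the log-odds scale. The sketch below converts two of them into an odds ratio and predicted probabilities. The printed call passes the formula as an object, so the outcome is not named here. Treating the first cohort and the earliest `aonsby` level as the reference categories is an assumption based on which levels are missing from the table.

```r
# Estimates copied from the glm output (log-odds scale).
intercept <- -0.63412
cohort2 <- 0.93646

# Odds ratio for Cohort 2 relative to the (assumed) reference cohort.
or_cohort2 <- exp(cohort2)

# Predicted probability of the outcome when every predictor is at its
# reference level, and for Cohort 2 with aonsby at its reference level.
p_reference <- plogis(intercept)
p_cohort2 <- plogis(intercept + cohort2)

round(c(odds_ratio = or_cohort2, p_reference = p_reference, p_cohort2 = p_cohort2), 3)
```

`plogis()` is the inverse logit, so `plogis(eta)` equals `1 / (1 + exp(-eta))` for a linear predictor `eta`.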
Cohort alone is not helpful for determining who was eligible at t2 and t3
As we can see below, nobody in cohort 4 took part in the Y2, Y3, or Y10 data collections. A minority of cohort 3 participants did take part at Y2 and Y3, but their participation rates are low, which suggests that some of them were eligible and some were not.
Code
save_plot("11_participation_pattern_grouped", width = 19, height = 6)

# Recode 0 (did not participate) to NA so the pairwise plot counts tested twins
df %>%
  select(all_of(rq1y)) %>%
  rename_with(~ var_to_label(., df)) %>%
  rename_with(~ clean_rq1y_label(.)) %>%
  mutate(across(everything(), ~ na_if(., 0))) %>%
  gbtoolbox::plot_pairwise_missing(textadjust = 2.5) +
  labs(subtitle = "Number of twins tested at each time point (diagonal) and \npair of time points (below diagonal)")
Warning in gbtoolbox::plot_pairwise_missing(., textadjust = 2.5): This function is in early beta, and not yet ready for widespread use.
Proceed with caution
| Female parent highest qualification level (1st Contact), see value labels | n | percent |
|---|---|---|
| no qualifications | 1208 | 9.28% |
| CSE grade 2-5 or O-level/GCSE grade D-G | 2000 | 15.36% |
| CSE grade 1 or O-level/GCSE grade A-C | 4779 | 36.71% |
| A-level or S-level | 1706 | 13.10% |
| HNC | 359 | 2.76% |
| HND | 486 | 3.73% |
| undergraduate degree | 1495 | 11.48% |
| postgraduate qualification | 782 | 6.01% |
| NA | 205 | 1.57% |
Relationship between maternal education and child outcomes
This section examines whether maternal education level predicts child academic and cognitive outcomes. We regress each outcome on maternal education using dummy coding (treating each education level as a separate category), which allows us to detect non-linear relationships.
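To illustrate what dummy coding does, here is a minimal sketch on simulated data. The data frame `toy`, its four education levels, and the group means are hypothetical and unrelated to the study data. It shows how a factor expands into 0/1 indicator columns, and how the dummy-coded model can be compared with one that assumes equally spaced steps between levels.

```r
# Hypothetical toy data: four education levels and an outcome whose group
# means do not rise in equal steps.
set.seed(1)
levels_edu <- c("none", "gcse", "a_level", "degree")
toy <- data.frame(
  edu = factor(rep(levels_edu, each = 50), levels = levels_edu)
)
toy$score <- c(5.0, 5.5, 5.6, 6.2)[as.integer(toy$edu)] +
  rnorm(nrow(toy), sd = 0.5)

# A factor predictor becomes one 0/1 column per non-reference level.
head(model.matrix(~ edu, data = toy))

# Dummy-coded model: each coefficient is a difference from the "none" group.
fit_dummy <- lm(score ~ edu, data = toy)

# Linear alternative: treats the levels as equally spaced steps.
fit_linear <- lm(score ~ as.integer(edu), data = toy)

# With dummy coding, the fitted values reproduce the observed group means.
group_means <- tapply(toy$score, toy$edu, mean)
fitted_means <- tapply(fitted(fit_dummy), toy$edu, mean)
all.equal(unname(group_means), unname(fitted_means))

# The linear model is nested in the dummy-coded one, so an F test
# checks for departure from linearity.
anova(fit_linear, fit_dummy)
```

The dummy-coded model spends one parameter per non-reference level, so it costs more degrees of freedom than the linear version but makes no assumption about the spacing between categories.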
Code
# Helper function to plot model predictions by maternal education level
plot_education_effects <- function(model, y_label) {
  marginaleffects::predictions(model, by = "amohqual") %>%
    data.frame() %>%
    rename(prediction = estimate, education_level = amohqual) %>%
    ggplot(aes(y = prediction, x = education_level, group = 1)) +
    geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2) +
    geom_point(size = 2) +
    geom_line() +
    labs(x = "Maternal Education Level", y = y_label) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
}
KS3 Academic Achievement (Age 14)
Code
# One twin per family, dropping families with missing maternal education
model_ks3 <- df %>%
  filter(twin == 1) %>%
  filter(!is.na(amohqual)) %>%
  lm(npks3tall1 ~ amohqual, data = .)

summary(model_ks3)
Call:
lm(formula = npks3tall1 ~ amohqual, data = .)
Residuals:
Min 1Q Median 3Q Max
-4.8335 -0.4234 0.0387 0.4676 3.4857
Coefficients:
                                                Estimate Std. Error t value Pr(>|t|)
(Intercept)                                      5.23404    0.06752  77.516  < 2e-16 ***
amohqualCSE grade 2-5 or O-level/GCSE grade D-G  0.28023    0.08062   3.476 0.000517 ***
amohqualCSE grade 1 or O-level/GCSE grade A-C    0.53886    0.07145   7.541 6.32e-14 ***
amohqualA-level or S-level                       0.78125    0.07623  10.248  < 2e-16 ***
amohqualHNC                                      0.66686    0.10415   6.403 1.79e-10 ***
amohqualHND                                      0.66195    0.09346   7.082 1.80e-12 ***
amohqualundergraduate degree                     0.97547    0.07648  12.754  < 2e-16 ***
amohqualpostgraduate qualification               1.01003    0.08401  12.023  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.7397 on 2693 degrees of freedom
(10114 observations deleted due to missingness)
Multiple R-squared: 0.1102, Adjusted R-squared: 0.1078
F-statistic: 47.63 on 7 and 2693 DF, p-value: < 2.2e-16